Symbol Ranking Text Compression with Shannon Recodings
Abstract
In his 1951 work on the information content of English text, Shannon described a method of recoding the input text, a technique which has apparently lain dormant for the ensuing 45 years. Whereas traditional compressors exploit symbol frequencies and symbol contexts, Shannon’s method adds the concept of “symbol ranking”, as in ‘the next symbol is the third most likely in the present context’. While some other recent compressors can be explained in terms of symbol ranking, few make explicit reference to the concept. This report describes an implementation of Shannon’s method and shows that it forms the basis of a good text compressor.
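The recoding idea in the abstract can be illustrated with a minimal sketch. This is not the report's implementation: it assumes that candidates are ranked by adaptive frequency counts within a fixed-length context, with ties broken by alphabet order. The names `encode_ranks`, `decode_ranks`, and the `order` parameter are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def _ranked(counts, alphabet):
    # Most frequent symbol first; ties broken by symbol order so that
    # the encoder and decoder always build the same candidate list.
    return sorted(alphabet, key=lambda s: (-counts.get(s, 0), s))


def encode_ranks(text, alphabet, order=2):
    """Replace each symbol by its 1-based rank in the current context."""
    model = defaultdict(dict)  # context string -> {symbol: count}
    ranks = []
    for i, sym in enumerate(text):
        ctx = text[max(0, i - order):i]
        candidates = _ranked(model[ctx], alphabet)
        ranks.append(candidates.index(sym) + 1)
        model[ctx][sym] = model[ctx].get(sym, 0) + 1  # adapt after coding
    return ranks


def decode_ranks(ranks, alphabet, order=2):
    """Invert encode_ranks by rebuilding the same model symbol by symbol."""
    model = defaultdict(dict)
    out = []
    for r in ranks:
        ctx = "".join(out[max(0, len(out) - order):])
        sym = _ranked(model[ctx], alphabet)[r - 1]
        out.append(sym)
        model[ctx][sym] = model[ctx].get(sym, 0) + 1
    return "".join(out)


text = "abababab"
alphabet = sorted(set(text))
ranks = encode_ranks(text, alphabet, order=1)
print(ranks)
print(decode_ranks(ranks, alphabet, order=1) == text)
```

On predictable text most emitted ranks are 1, so the rank stream is heavily skewed and a following entropy coder (not shown here) can code it compactly.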
Similar resources
Symbol Ranking Text Compression
In his work on the information content of English text in 1951, Shannon described a method of recoding the input text, a technique which has apparently lain dormant for the ensuing 45 years. Whereas traditional compressors exploit symbol frequencies and symbol contexts, Shannon’s method adds the concept of “symbol ranking”, as in ‘the next symbol is the one 3rd most likely in the present contex...
Symbol-driven compression of Burrows Wheeler transformed text
Despite the enormous growth in storage capacity in recent years, the search for fast and efficient text compression algorithms continues. As processor speed is increasing at a higher rate than disk access time is decreasing, there is now even more reason to store information in a compressed form than there was previously. Prediction by Partial Matching (PPM), first published in 1984, was a sign...
Data Compression Using a Sort-Based Context Similarity Measure
Every symbol in the data can be predicted by taking the immediately preceding symbols, or context, into account. This paper proposes a new adaptive data-compression method based on a context similarity measure. We measure the similarity of contexts using a context sorting mechanism. The aim of context sorting is to store a set of contexts in a specific order so that contexts more similar to the ...
Can We Do without Ranks in Burrows Wheeler Transform Compression?
Compressors based on the Burrows Wheeler transform (BWT) convert the transformed text into a string of (move-to-front) ranks. These ranks are then encoded with an Order-0 model, or a hierarchy of such models. Although these rank-based methods perform very well, we believe the transformation to MTF numbers blurs the distinction between individual symbols and is a possible cause of inefficiency. Ins...
Prediction by Compression
It is well known that text compression can be achieved by predicting the next symbol in the stream of text data based on the history seen up to the current symbol. The better the prediction, the more skewed the conditional probability distribution of the next symbol and the shorter the codeword that needs to be assigned to represent this next symbol. What about the opposite direction? Suppose w...
Journal: J. UCS
Volume: 3
Publication date: 1997